Today we will…
Hi, I’m Dr. Rehnberg!
I am a transplant to the west coast – PA to MO to MI to CA.
My favorite things are being outside, drinking tea, and watching reality TV.
I am teaching this course for the first time – please bear with me as I get materials ready for Canvas.
I have a genetic, degenerative eye disease called Stargardt disease, which causes me to have poor vision, even with corrective lenses.
What this means for you:
When I am helping you on your computer, please make the font large and turn the brightness up.
I have difficulty recognizing faces – please be patient!
Questions?
I am looking forward to reading your introductions on Canvas Discussions!
R’s strengths are…
… handling data with lots of different types of variables.
… making nice and complex data visualizations.
… having cutting-edge statistical methods available to users.
R’s weaknesses are…
… performing non-analysis programming tasks, like website creation (Python, Ruby, …).
… hyper-efficient numerical computation (MATLAB, C, …).
… being a simple tool for all audiences (SPSS, Stata, JMP, Minitab, …).
The heart and soul of R are packages.
To install a package use: install.packages("tibble") (note the quotes around the package name).
Importantly, R is open-source.
This means packages are created by users like you and me!
Being a good open-source citizen means…
… sharing your code publicly when possible (later in this course, we’ll learn about GitHub!).
… contributing to public projects and packages, as you are able.
… creating your own packages, if you can.
… using R for ethical and respectful projects.
A value is a basic unit of stuff that a program works with.
Values have types:
logical / boolean: FALSE/TRUE or 0/1 values.
integer: whole numbers.
double / float / numeric: decimal numbers.
float: “floating-point value”.
double: floating-point value with more precision.
numeric: decimal value, regardless of precision.
character / string - holds text, usually enclosed in quotes.
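One way to check these types at the console is with typeof() (how R stores a value) and class() (how R labels it). This is a minimal sketch, and the example values are arbitrary:

```r
# typeof() reports how R stores a value
typeof(TRUE)     # "logical"
typeof(5L)       # "integer" (the L suffix asks for a whole number)
typeof(3.14)     # "double"
typeof("tea")    # "character"

# class() uses the label "numeric" for doubles
class(3.14)      # "numeric"

# Without the L suffix, a whole number is still stored as a double
typeof(5)        # "double"
```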
Variables are names that refer to values.
A variable is like a container that holds something - when you refer to the container, you get whatever is stored inside.
We assign values to variables using the syntax object_name <- value.
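As a small illustration of assignment, here is a sketch using a made-up variable name, my_number:

```r
# Store the value 5 in a container named my_number (a made-up name)
my_number <- 5

# Referring to the name gives back whatever is stored inside
my_number        # 5
my_number * 2    # 10

# Assigning again replaces the contents of the container
my_number <- "five"
my_number        # "five"
```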
Homogeneous: every element has the same data type.
Vector: a one-dimensional column of homogeneous data.
Matrix: the next step after a vector - it’s a set of homogeneous data arranged in a two-dimensional, rectangular format.
Heterogeneous: the elements can be of different types.
List: a one-dimensional column of heterogeneous data.
Dataframe: a two-dimensional set of heterogeneous data arranged in a rectangular format.
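The four structures above can be sketched in base R as follows. All object names and values here are invented for illustration:

```r
# Vector: one dimension, one type
heights <- c(150.5, 162.0, 171.3)

# Matrix: two dimensions, one type
counts <- matrix(1:6, nrow = 2, ncol = 3)

# List: one dimension, mixed types
person <- list(name = "Ana", age = 30, student = TRUE)

# Data frame: two dimensions, each column can have its own type
roster <- data.frame(
  name = c("Ana", "Ben"),
  age = c(30, 25),
  student = c(TRUE, FALSE)
)

dim(counts)     # 2 3
str(roster)     # prints the type of each column

# Mixing types in a vector converts every element to a single type
mixed <- c(1, "a", TRUE)
typeof(mixed)   # "character"
```

The last two lines show what "homogeneous" means in practice: a vector cannot hold mixed types, so R converts them all to text.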
We use square brackets ([]) to access elements within data structures.
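A minimal sketch of square-bracket indexing on each structure, with invented objects and values:

```r
# Vectors: one index
temps <- c(61, 64, 70, 68)
temps[2]            # 64
temps[c(1, 4)]      # 61 68
temps[-1]           # 64 70 68 (a negative index drops that element)

# Matrices: [row, column]; matrix() fills column by column
m <- matrix(1:6, nrow = 2)
m[2, 3]             # 6
m[1, ]              # 1 3 5 (leaving the column blank keeps the whole row)

# Lists: [[ ]] pulls out the element itself
info <- list(course = "STAT 331", term = "S23")
info[["course"]]    # "STAT 331"

# Data frames: [row, column], and columns can be referred to by name
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df[2, "x"]          # 2
```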
We can combine logical statements using and, or, and not.
(X AND Y) requires that both X and Y are true.
(X OR Y) requires that one of X or Y is true.
(NOT X) is true if X is false, and false if X is true.
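In R, AND, OR, and NOT are written &, |, and !. A short sketch with made-up values:

```r
x <- TRUE
y <- FALSE

x & y    # FALSE (AND: both must be true)
x | y    # TRUE  (OR: at least one must be true)
!x       # FALSE (NOT: flips the value)

# The same operators work element by element on vectors
ages <- c(15, 22, 37)
ages >= 18 & ages < 30    # FALSE TRUE FALSE
ages < 18 | ages > 30     # TRUE FALSE TRUE
```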
RStudio is an IDE (Integrated Development Environment).
RStudio was released in 2011 by J.J. Allaire.
In 2014, RStudio hired Hadley Wickham as Chief Data Scientist. They now employ around 20 full-time developers.
Recall: You cannot sell R code, so packages created by RStudio’s team are freely available.
They make money off the IDE and other helper software.
In 2020, RStudio became a PBC (Public Benefit Corp), meaning they are legally obligated to support education and open-source development.
A directory is just a fancy name for a folder.
Your working directory is the folder that R “thinks” it lives in at the moment.
[1] "/Users/zrehnber/Library/CloudStorage/OneDrive-CalPoly/STAT 331:531/S23/W1"
This file lives in my user files Users/…
…on my account zrehnber/ …
…in my OneDrive OneDrive - Cal Poly …
…in a series of organized folders.
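A path like the one above is what getwd() prints. A minimal sketch for inspecting your own working directory (your output will differ from mine):

```r
# Ask R which folder it is currently working in
wd <- getwd()
wd

# basename() gives the last folder in the path;
# dirname() gives everything before it
basename(wd)
dirname(wd)

# List the files R can see in that folder
list.files(wd)
```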
Create a directory for this class!
Is it in a place you can easily find it?
Does it have an informative name?
Are the files inside it well-organized?
An R Project is basically a “flag” planted in a certain directory.
When you double click a .Rproj file, it:
Opens RStudio
Sets the working directory to be wherever the .Rproj file lives.
Links to GitHub, if set up (more on that later!)
RStudio Projects are great for reproducibility!
You can send anyone your folder with your .Rproj file and they will be able to run your code on their computer.
We will be using RStudio Projects throughout this course.
You can send your project to someone else, and they can jump in and start working right away.
This involves:
Files are organized and well-named.
References to data and code work for everyone.
Package dependency is clear.
Code will run the same every time, even if data values change.
Analysis process is well-explained and easy to read.
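One concrete piece of "code will run the same every time" is fixing the random seed before any step that uses random numbers. This is a minimal sketch; the seed value 331 is an arbitrary choice:

```r
# Fix the seed, then draw three random numbers
set.seed(331)
first_draw <- runif(3)

# Resetting the same seed reproduces exactly the same draws
set.seed(331)
second_draw <- runif(3)

identical(first_draw, second_draw)   # TRUE
```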
/User/zrehnber/Stat331/lab1/ rather than Desktop/stuff/
If you put something like this at the top of your .qmd file (more on Quarto later), I will set your computer on fire:
Setting working directory by hand = BAD!
That directory is specific to you!
R Markdown and Quarto (more on these later) ignore this code when knitting!
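One alternative is to open the project through its .Rproj file and refer to files relative to the project folder. A minimal sketch, assuming a hypothetical data folder and a hypothetical file named survey.csv:

```r
# Build a path relative to the project folder
# (the folder and file names are hypothetical)
data_path <- file.path("data", "survey.csv")
data_path    # "data/survey.csv"

# Check that the file exists before trying to read it
if (file.exists(data_path)) {
  survey <- read.csv(data_path)
} else {
  message("No file found at ", data_path)
}
```

Because the path does not start from a specific user's home folder, the same line works on any computer that has the project folder.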
seq(from = 1, to = 10, by = 1
Error: <text>:2:0: unexpected end of input
1: seq(from = 1, to = 10, by = 1
^
seq(from = 1, to = 10 by = 1)
sequence(from = 1, to = 10, by = 1)
Error in sequence.default(from = 1, to = 10, by = 1): argument "nvec" is missing, with no default
sqrt('1')
my_obj(5)
Error in my_obj(5): could not find function "my_obj"
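For comparison, here is a sketch of versions of these calls that run. The definition of my_obj is invented purely for illustration, since the slides do not say what it should do:

```r
# All commas and the closing parenthesis are in place
seq(from = 1, to = 10, by = 1)    # 1 2 3 4 5 6 7 8 9 10

# sqrt() needs a number, not text
sqrt(1)                           # 1

# A function must exist before it is called
# (this definition is made up for illustration)
my_obj <- function(x) x + 1
my_obj(5)                         # 6
```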
Just because you see scary red text, this does not mean something went wrong! This is just R communicating with you.
Often, R will give you a warning.
This means that your code did run…
…but you probably want to make sure it succeeded.
Does this look right?
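A small sketch of a warning in action, using made-up values. The code runs, but one result is not what you might hope for:

```r
# "three" cannot be converted to a number, so R warns and fills in NA
raw_values <- c("1", "2", "three")
converted <- as.numeric(raw_values)
# Warning message: NAs introduced by coercion

# Always check the result after a warning
converted                 # 1 2 NA
sum(is.na(converted))     # 1
```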
If the word Error appears in your message from R, then you have a problem.
Error: Object some_obj not found.
Error: Object of type ‘closure’ is not subsettable.
Error: Non-numeric argument to binary operator.
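To see these messages safely, one option is to catch each error and look at its text. The helper get_error() below is a hypothetical function written for this sketch, not part of R, and the exact wording of messages can vary slightly between R versions:

```r
# Run an expression and return its error message as text
# instead of stopping (a hypothetical helper for illustration)
get_error <- function(expr) {
  tryCatch(
    {
      expr
      "no error"
    },
    error = function(e) conditionMessage(e)
  )
}

get_error(some_obj)    # the object was never created
get_error(mean[1])     # mean is a function, so [ ] does not apply
get_error("a" + 1)     # text cannot be added to a number
get_error(1 + 1)       # "no error"
```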
Look at the help file for the function!
When all else fails, Google your error message.
Leave out the specifics.
Include the function you are using.
What’s wrong here?
The components of the Practice Activity are described below:
Part One:
This file has many mistakes in the code. Some are errors that will prevent the file from knitting; some are mistakes that do NOT result in an error.
Fix all the problems in the code chunks.
Part Two:
Follow the instructions in the file to uncover a secret message.
Submit the name of the poem as the answer to the Canvas Quiz question.
Today we will…
Scripts are files of code that are meant to be run on their own.
File > New File > R Script
# indicates a comment
Scripts can be run in RStudio by clicking the Run button at the top of the editor window when the script is open.
You can also run code interactively in a script by highlighting lines of code (or even just placing your cursor on a line of code) and hitting Run or Ctrl + Enter.
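A minimal sketch of what a short script might contain. The file name and values are made up:

```r
# my_first_script.R (a hypothetical file name)
# Lines starting with # are comments; R skips them when the script runs

# Create a small vector of values
values <- c(2, 4, 6, 8)

# Compute and print the average
average <- mean(values)
print(average)    # 5
```

Placing the cursor on any one of these lines and pressing Ctrl + Enter runs just that line in the console.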
Notebooks are an implementation of literate programming.
Allows you to integrate code, output, text, images, etc. into a single document.
Reproducibility!
Markdown (without the “R”) is a markup language. This means it uses special symbols to format documents and make them look nice.
Markdown files have the .md extension.
R Markdown (with the “R”) uses regular markdown, plus it can run and display R code. (Other languages, too!)
R Markdown files have the .Rmd extension.
Quarto unifies and extends the R Markdown ecosystem.
Quarto is the next generation of R Markdown.
??? It unifies it for people who love R Markdown by reducing little points of friction, so that websites, books, and slides all have a common approach.
And it extends it for people who don’t know R Markdown by being a friendly way to work reproducibly and publish documents that have text and code in the same place.
Think of all the packages from the R Markdown universe that you’ve come to love and rely on over the years. Each package addresses a different need and offers a different output: for example, you can use blogdown or distill for creating websites and blogs, bookdown for writing a book, rticles for writing journal articles, etc.
Quarto unifies the functionality from these packages,
building on Pandoc on the technical side
and on the human side – over a decade of experience developing, maintaining, and tweaking these packages, as well as community feedback on the R Markdown extension packages.
Such a unification effort presents a fantastic opportunity to identify and address gaps in functionality and inefficiencies in user experience, and the process of building Quarto has been no different.
Consistent implementation of attractive and handy features across outputs: tabsets, code-folding, syntax highlighting, etc.
More accessible defaults as well as better support for accessibility
Guardrails, particularly helpful for new learners: YAML completion, informative syntax errors, etc.
Support for other languages like Python, Julia, Observable, and more via the Jupyter engine for executable code chunks.
???
Some highlights of these improvements include
–> consistent implementation of attractive and handy features across all outputs, like tabsets, code-folding, and syntax highlighting,
–> more accessible defaults as well as better support for creating accessible documents,
–> guardrails that are particularly helpful for new learners, like YAML completion and informative syntax errors,
–> and perhaps most excitingly for those who are not coming from the R ecosystem, Quarto offers support for other languages like Python, Julia, Observable, and more via the Jupyter engine for executable code chunks. And it’s designed to be expandable to more engines and languages, even those that might not exist today.
[pause] So by now I’m assuming many of you have already started playing with Quarto, or you’re waiting for me to do that. So, let’s dive in and see what all the Quarto fuss is about!
Quarto CLI orchestrates each step of rendering
???
Now that you’ve all had a chance to see Quarto in action, you might be wondering, “how are R Markdown and Quarto different”?
The main difference is that with R Markdown, the R package rmarkdown does the heavy lifting of going from the source Rmd file to the output, using knitr for evaluating the code chunks.
–> With Quarto, on the other hand, the Quarto command line interface, or the Quarto CLI, does the orchestration of processing executable code chunks with either knitr or jupyter and then converting the resulting markdown file to the desired output.
While this is technically impressive, I’ll be honest, it’s not exactly what sparked my interest in Quarto in the first place.
Quarto makes moving between formats straightforward
.pull-left[ Document
Presentation
].pull-right[ Website
]
???
What did spark my interest was how straightforward it is with Quarto to move between output formats.
As an educator, two things are of utmost importance to me about the tools I use to create my materials: reproducibility and ease of transition between output formats, like documents to slides to websites to books.
Over the last year of using Quarto for pretty much everything, I’ve felt like I’ve finally found the tool that lets me go from one output type to the other with minimal, if any, futzing around with my source code beyond the YAML. For example, here are common things I produce:
–> a lesson in document form
–> the same content in presentation form
–> the same content on a page in a website,
and you can see that all that needed to change going between these formats is a few lines in the YAML. Nothing in the content part of my document. No slide breaks to remove, no citation style to change, no headings to re-level. This ease of transition has freed up time to focus on content, and that, folks, is the dream!
Welcome to Quarto!
Formatting the text in your document
Markdown is a markup language. This means it uses special symbols to format documents and make them look nice.
*text* – makes italics
**text** – makes bold text
# – makes headers
![caption](image path) or [text](URL) – includes images or HTML links
< > – embeds URLs
#|
???
When you click Render, here is what happens: